Searching and replacing

The Find and Replace... command in the View menu allows text, elements, and patterns to be found and replaced.

When you choose this command you will see a dialog box, like the one illustrated below, that allows you to enter various values and options.
[Illustration: the Find dialog box]

The search and replace operations represented by the buttons at the bottom of the dialog box generally work the way they do in other word-processing applications. Note the following, however:

Choose Find Next to repeat a search using the most recent search string.

Specifying the search and replace strings

The Find text entry box allows you to specify a search string of text characters, elements, or patterns. If the document contains a selection when you choose Find and Replace..., the selected text automatically becomes the search string. If the selected text is longer than 255 characters, it will be truncated. If the selection contains an element, it will be truncated at the last character before the start-tag icon.

The Replace text entry box allows you to specify a replace string consisting of text characters, elements, or patterns with which you want to replace the search string.

The Find In text entry box allows you to restrict your search to a particular element.

The Find, Replace, and Find In strings are described in more detail below.

Search options

There are five search options that you can set: search forward or backward through the file, match only whole words, match upper- and lower-case exactly, employ wrapping, and perform pattern searching. These options can be used in combination. You can turn Backwards Search on or off by clicking its check box in the Find and Replace dialog box. The other options are set by clicking the Options... button and then clicking the appropriate check boxes in the dialog box that appears.

Searching for elements

The search and replace strings can both be elements. An open angle bracket, `<', followed by a valid element name matches an element. The angle bracket must be the first character of the search string. The name can optionally be followed by a closing angle bracket, `>'. If the search succeeds, the insertion point is positioned to the right of the start tag.

For example,

<P

matches the element P. Element names are not case sensitive in HoTMetaL, so `<p' and `<P' will match the same elements.

In a replacement, if the search string and the replace string are both elements, the element in the search string will be changed to the type specified in the replace string if the HTML rules allow it. The contents of the element will be unchanged.

If the search string matches text (as opposed to an element) and the replace string is an element, the element will be inserted after the found text if the replacement operation is carried out.

Searching for text within an element

The search string can contain both an element name and, following it, some text (or a pattern) that must be matched within the element. In this case the element name must end with a closing angle bracket. For example:

<P>the

would match the word `the' anywhere within the element P. This is similar to the restricted searching that the Find In string provides, and the two can be combined to restrict the search further. In the last example, if the Find In string is set to:

<OL

the word `the' would be matched if it appeared in a paragraph in an OL list but not if it appeared in a paragraph in another context.
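The combined effect of the search string `<P>the' and the Find In string `<OL' can be sketched in Python. This is only an illustration that runs the standard html.parser module over a serialized document; the names ScopedFinder and find are hypothetical, and nothing here describes how HoTMetaL itself performs the search.

```python
from html.parser import HTMLParser


class ScopedFinder(HTMLParser):
    """Collect text found inside `element`, optionally restricted to `find_in`."""

    def __init__(self, element, needle, find_in=None):
        super().__init__()
        self.element = element.lower()   # element names are not case sensitive
        self.needle = needle
        self.find_in = find_in.lower() if find_in else None
        self.open_tags = []              # elements currently open, outermost first
        self.hits = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close everything back to (and including) the matching start tag.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        in_element = self.element in self.open_tags
        in_scope = self.find_in is None or self.find_in in self.open_tags
        if in_element and in_scope and self.needle in data:
            self.hits.append(data)


def find(html, element, needle, find_in=None):
    finder = ScopedFinder(element, needle, find_in)
    finder.feed(html)
    finder.close()
    return finder.hits


doc = "<p>the donkey</p><ol><li><p>the burro</p></li></ol>"

anywhere = find(doc, "p", "the")          # ['the donkey', 'the burro']
in_lists = find(doc, "p", "the", "ol")    # ['the burro']
```

The first call corresponds to searching for `<P>the' alone; the second adds the `<OL' restriction, so the paragraph outside the list is no longer reported.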

An element name in the replace string cannot be followed by text: if it is, an error message will be displayed and the replace operation will not be performed.

Attributes

You can restrict the search to an element with specific attribute values. This is done in the search string by following the element name with a space-separated list consisting of attribute names followed by an equal sign, `=', followed by a value contained in double quotes (" "). For example:

<a name="donkey"

will search only for those A elements whose NAME attribute has the value `donkey'.

You can specify replacement attribute values in the replace string. For example, you could use the following replace string in conjunction with the find string in the previous example:

<a name="burro"

Any attribute values that aren't specified in the replace string will remain unchanged.

Search patterns can be used to specify attribute values. You can specify as many attributes as you wish, and in any order.
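The `donkey' to `burro' replacement above can be imitated on serialized HTML with Python's re module. This is a rough sketch under stated assumptions: the helper replace_attr is hypothetical, attribute values are assumed to be double-quoted, and the code does not reflect HoTMetaL's internals.

```python
import re


def replace_attr(html, element, attr, old, new):
    """Change attr="old" to attr="new" in start tags of `element`.

    Every other attribute in the tag is left untouched.
    """
    # Start tags of the requested element; the element name is case-insensitive.
    start_tag = re.compile(r"<%s\b[^>]*>" % re.escape(element), re.IGNORECASE)
    # The attribute name is case-insensitive, the value is matched exactly.
    pair = re.compile(r'\b((?i:%s)=)"%s"' % (re.escape(attr), re.escape(old)))

    def fix_tag(match):
        return pair.sub(lambda m: m.group(1) + '"' + new + '"', match.group(0))

    return start_tag.sub(fix_tag, html)


src = '<A NAME="donkey" HREF="x.html">one</A> <A NAME="mule">two</A>'
out = replace_attr(src, "a", "name", "donkey", "burro")
# out == '<A NAME="burro" HREF="x.html">one</A> <A NAME="mule">two</A>'
```

Only the first A element is changed, and its HREF attribute survives unchanged, which mirrors the rule that attributes not named in the replace string are left alone.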

Find In

One of HoTMetaL's more powerful search features is its ability to restrict a search to the contents of a particular element type. For example, you could search for a word only when it appears in an EM element.

Use the Find In text entry box to specify the element that you want to restrict searching to. Specify the element in the same way you would in the search string, except that the element name can't be followed by text. Attribute values may also be specified in the Find In string; for example, you can use a Find In string such as:

<li label="Donkeys"

Error messages

If you have a badly-formed search or replace string, HoTMetaL will display an error dialog box giving a description of the error. Errors that will be reported include: invalid attribute or element names; unmatched parentheses and brackets in search patterns; `?', `*', or `+' not preceded by any character; invalid character ranges.

For example, if you use the search pattern:

<QUAGMIRE

you will get the error message:

Find: Invalid element name

because the HTML rules do not allow an element called QUAGMIRE.

Using search patterns

If the Find Patterns option is turned on, the characters you type in the Find text entry box are interpreted as patterns by HoTMetaL: that is, the search string can contain certain special search characters that allow the search string to match a class of strings. (If your search string does not contain any special search characters, HoTMetaL will search for exactly the text you have typed.) For example, the search character `.' (period) is used in the following pattern:

m...y

This matches a sequence of five characters beginning with `m' and ending with `y', e.g., the words `money', `marry', `murky', etc.

The following characters are special search characters in a search pattern:

. * ? + ^ $ [ ]

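In addition, the character `<' (used to specify an element search) is special when it appears as the first character of the pattern. Parentheses, `(' and `)', the vertical bar, `|', and the backslash, `\', also have special meanings in patterns, as described in the sections that follow.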

To search for any special character as an ordinary character when Find Patterns is turned on, precede it with a backslash. For example:

\.

is used to match a period.
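The difference between an escaped and an unescaped period can be seen in Python's re module, which happens to use the same backslash convention. This is an analogy only; HoTMetaL's pattern matcher is a separate implementation and may differ in detail.

```python
import re

text = "Version 1.5 replaces version 105."

# Unescaped: '.' stands for any single character, so '105' also matches.
unescaped = re.findall(r"1.5", text)    # ['1.5', '105']

# Escaped: '\.' stands for a literal period only.
escaped = re.findall(r"1\.5", text)     # ['1.5']
```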

Search patterns may be enclosed within parentheses for grouping.

Matching a single character

Any single character (other than a special character) matches itself in a search pattern. To match a single, arbitrary character, use a period or dot, `.'. This will also match a single blank space. Therefore:

fo.d

would match `food', `ford', `fond', `fold', etc. Similarly,

s.o.

matches `stop', `shot', `snow', etc.
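As a quick check of the `fo.d' example, the same pattern can be tried in Python's re module, where the period has the same meaning for ordinary characters and spaces. This is an illustrative sketch, not HoTMetaL code; the word list is made up for the example.

```python
import re

candidates = ["food", "ford", "fo d", "fod", "fjord"]

# fullmatch requires the whole candidate to fit the pattern.
matches = [word for word in candidates if re.fullmatch(r"fo.d", word)]
# ['food', 'ford', 'fo d']
```

`fod' is rejected because the period must consume exactly one character, and `fjord' is rejected because it does not start with `fo'.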

Matching zero or more of something

A single character, or a string enclosed in parentheses, followed by an asterisk, `*', matches zero or more occurrences of that character or string. For example:

l*ama

would match `ama', `lama', `llama', `lllama', etc.

b(an)*a

would match `ba', `bana', `banana', and so on.

You can combine the `*' with `.' to match arbitrary strings of characters. So

s.*ch

matches `search', `such', `stretch', `stopwatch', as well as `sch' and `skip lunch'. This search pattern represents strings that start with `s' followed by zero or more occurrences of an arbitrary single character (it doesn't have to be the same character over and over) followed by the characters `ch'. Since the period can match a blank space, this pattern can match a multi-word string.
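The three asterisk examples above behave the same way in Python's re module, which can serve as a sandbox for experimenting with such patterns. The helper whole_matches is hypothetical, and the equivalence with HoTMetaL's matcher is assumed rather than guaranteed.

```python
import re


def whole_matches(pattern, candidates):
    """Return the candidates that the pattern matches in their entirety."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


single = whole_matches(r"l*ama", ["ama", "lama", "llama", "drama"])
# ['ama', 'lama', 'llama']

grouped = whole_matches(r"b(an)*a", ["ba", "bana", "banana", "bnna"])
# ['ba', 'bana', 'banana']

arbitrary = whole_matches(r"s.*ch", ["sch", "such", "skip lunch", "sachet"])
# ['sch', 'such', 'skip lunch']
```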

Matching one or more of something

A single character, or a string enclosed in parentheses, followed by a plus sign, `+', matches one or more occurrences of that character or string. For example, the following expression matches `ben', `been', `beeen', and so forth, but not `bn'.

be+n

Matching zero or one of something

A single character, or a string enclosed in parentheses, followed by a question mark, `?', matches zero or one occurrences of that character or string. For example, to search for instances of both `color' and `colour', you would use:

colou?r
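The `+' and `?' operators can be compared side by side in Python's re module, which uses the same notation. This is a sketch for experimentation only; the sample text is invented for the example.

```python
import re

text = "bn ben been beeen; color and colour"

# '+' needs at least one 'e', so 'bn' is skipped.
one_or_more = re.findall(r"be+n", text)       # ['ben', 'been', 'beeen']

# '?' makes the 'u' optional, so both spellings are found.
zero_or_one = re.findall(r"colou?r", text)    # ['color', 'colour']
```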

Either/or searches

If you want to search for either of two search patterns, separate them with a vertical bar, `|'. This will match any string that matches either of the patterns. For example, if you wanted to search for either `love' or `money', you would use the expression:

love|money

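The alternatives can themselves be search patterns. For example, the following expression matches any string that matches either `s.*ch' or `fo.d':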

s.*ch|fo.d
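Both either/or examples can be tried in Python's re module, where the vertical bar has the same meaning. The sample sentences are invented, and the behaviour shown is Python's; it is assumed, not verified, that HoTMetaL groups the alternatives the same way.

```python
import re

plain = re.findall(r"love|money", "for love or money")
# ['love', 'money']

combined = re.findall(r"s.*ch|fo.d", "a search for food")
# ['search', 'food']
```

In the second call each alternative is a complete pattern: `search' is found by `s.*ch' and `food' by `fo.d'.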

Matching just after a tag

A caret, `^', at the very beginning of a search pattern means that text will match the pattern only if it immediately follows a start- or end-tag. Such text must not be separated from the tag by white space. Anywhere else, the caret is not treated as a special search character (except in character ranges, see below). For example, if you wanted to search for the word `Note' immediately following a tag, you would use:

^Note

Matching just before a tag

A dollar sign, `$', at the very end of a search pattern means that text will match the pattern only if it is immediately followed by a tag. The text must not be separated from the tag by white space. Anywhere else, the dollar sign is not treated as a special search character. For example, if you wanted to search for the word `sub' immediately preceding a tag, you would use:

sub$
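In most regular-expression tools, including Python's re module, `^' and `$' refer to the start and end of a line or string, not to tags. The tag-relative meaning described here can be approximated on serialized HTML with lookbehind and lookahead assertions. This is a sketch that assumes every tag is written between `<' and `>'; it is not how HoTMetaL works internally.

```python
import re

html = "<p>Note the Note.</p><p> Note</p><sub>x sub</sub>"

# Rough equivalent of ^Note : 'Note' directly after the '>' that ends a tag.
after_tag = [m.start() for m in re.finditer(r"(?<=>)Note", html)]    # [3]

# Rough equivalent of sub$ : 'sub' directly before the '<' that starts a tag.
before_tag = [m.start() for m in re.finditer(r"sub(?=<)", html)]     # [40]
```

The `Note' in the second paragraph is not reported because a space separates it from the tag, which matches the white-space rule stated above.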

Character ranges

A pair of square brackets, `[' and `]', around any string of characters defines a range that matches any one of the characters between the brackets. The simplest case is of this type:

an[dy]

This matches `and' and `any'.

A range of characters of the form

[char1-char2]

matches any single character from char1 through char2, inclusive. For example:

[e-p]

matches any lowercase letter between `e' and `p', inclusive. The pattern:

[A-Za-z]

matches any upper or lower case letter.

[A-Za-z0-9]

matches any alphanumeric character.

A range of characters can be embedded in a longer range. For example, the pattern:

[ac-fh]

matches any of `a', `c' through `f', and `h'.

If searching is not in case-sensitive mode, no distinction between lower case and upper case letters is made in character ranges. In this situation, for example, the character range:

[a-z]

would match any upper- or lower-case letter.

You can reverse the meaning of a character range by preceding it with a caret, `^': this causes it to match any character not in the range. For example:

th[^ei]n

matches `than' but not `then' or `thin'. An expression of the form:

[^char1-char2]

matches any character not in the range of characters beginning at char1 and ending at char2.
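The character-range examples in this section can be checked in Python's re module, which uses the same bracket notation. The re.IGNORECASE flag stands in for searching with case sensitivity turned off; that correspondence is an assumption made for illustration.

```python
import re

simple = [w for w in ["and", "any", "ant"] if re.fullmatch(r"an[dy]", w)]
# ['and', 'any']

span = re.findall(r"[e-p]", "zebra pod")            # ['e', 'p', 'o']

embedded = re.findall(r"[ac-fh]", "abcdefgh")       # ['a', 'c', 'd', 'e', 'f', 'h']

negated = re.findall(r"th[^ei]n", "than then thin") # ['than']

# With case ignored, [a-z] matches upper-case letters as well.
caseless = re.findall(r"[a-z]", "Ab", re.IGNORECASE)  # ['A', 'b']
```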

Re-using the search string

If you surround a sub-expression in the search string by parentheses, `(' and `)', you can refer in the replace string to whatever this sub-expression matches. In general, an expression in the replace string of the form `\n', where n is a number from 1 to 9, means `replace this expression with whatever the nth expression in parentheses in the search string has matched'. For example, if the search string is:

(.)read

and the replace string is:

\1ox

then if the search string matches `bread', the found text will be replaced by `box'. This is because the sub-expression `(.)' matched the letter `b'; the expression `\1' in the replace string means `replace this expression with whatever is matched by the first expression in parentheses in the search string'. Therefore `b' is substituted for `\1' and the replace string becomes `box'.

Here is a more complicated example: suppose the search string is:

(v.*e) (v.*a)

and the replace string is:

\2 \1

Now the search string may match the words `vice versa'. The first sub-expression, `(v.*e)', matches `vice' and the second sub-expression, `(v.*a)', matches `versa'. In the replace string, HoTMetaL replaces `\2' by what the second sub-expression in the search string matched, and replaces `\1' by what the first sub-expression matched. Therefore the replace string becomes `versa vice'. The net effect of the operation is to replace an occurrence of `vice versa' with `versa vice'.

It is possible to nest sub-expressions. In this situation, the sub-expressions are numbered according to the order of occurrence of their left parentheses. For example, if the search string were:

(a(bc)d)

and the replace string:

\2 \1

the effect would be to find `abcd' and replace it by `bc abcd'.

The expression `\0' in a replace string refers to the entire string that was matched by the search string. For example, if the search string were:

fish

and the replace string were:

gone \0ing

then an occurrence of `fish' would be replaced by `gone fishing'.
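The four replacements described so far can be reproduced with re.sub in Python. The numbering of sub-expressions works the same way there, with one difference worth labeling: Python writes the whole match as `\g<0>' rather than `\0'. The code is an analogy for experimentation, not HoTMetaL's implementation.

```python
import re

# \1 refers to the first parenthesized sub-expression.
box = re.sub(r"(.)read", r"\1ox", "bread")                    # 'box'

# Two sub-expressions, swapped in the replacement.
swapped = re.sub(r"(v.*e) (v.*a)", r"\2 \1", "vice versa")    # 'versa vice'

# Nested sub-expressions are numbered by their left parentheses.
nested = re.sub(r"(a(bc)d)", r"\2 \1", "abcd")                # 'bc abcd'

# Whole match: HoTMetaL's \0 corresponds to \g<0> in Python.
whole = re.sub(r"fish", r"gone \g<0>ing", "fish")             # 'gone fishing'
```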

You can use `\n' expressions in attribute replacement values. One application of this technique is changing a group of URLs in some regular way. (The Publish... command lets you change the scheme for a set of URLs; `\n' substitution is a more general form of this kind of change.) For example, if you want to change all of the filenames in your A elements to have the `.htm' file extension instead of `.html', you could use the following pattern for the find string:

<a href="(.*)html"

And the replace string:

<a href="\1htm"

The element is matched by `<a'; the attribute that contains the URL value is called HREF; the pattern `(.*)' matches everything in the URL up to the characters `html'; in the replacement, everything this pattern matched is substituted for `\1', and the characters `htm' are appended, thus creating the modified filename.

There's an even simpler way to do this, if you're sure that all the filenames end in `.html'. Use the following find string:

<a href="(.*)l"

And the replace string:

<a href="\1"

In this case, the replacement string will consist of everything the find string matched, except the final letter `l'.
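Both versions of the `.html' to `.htm' change can be imitated on serialized HTML with re.sub in Python. One adjustment is an assumption of this sketch: HoTMetaL applies `(.*)' to a single attribute value, whereas a regular expression run over a whole line needs `[^"]*' so that the match cannot run past the closing quote.

```python
import re

src = '<a href="donkey.html">Donkeys</a> <a href="burro.html">Burros</a>'

# First form: keep everything before 'html' and append 'htm'.
first = re.sub(r'<a href="([^"]*)html"', r'<a href="\1htm"', src)

# Second form: keep everything except the final 'l'.
second = re.sub(r'<a href="([^"]*)l"', r'<a href="\1"', src)

expected = '<a href="donkey.htm">Donkeys</a> <a href="burro.htm">Burros</a>'
# first == expected and second == expected
```

Neither form checks for the period before the extension, so a value that merely ends in `html' or `l' would also be changed; including an escaped period, `\.', in the find string is a stricter choice.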

Summary

The following list summarizes the search patterns and special characters available in HoTMetaL's search facility.

ordinary character      itself
<name, <name>           the element name
. (dot)                 any single character
x*                      0 or more occurrences of the character x
(pattern)*              0 or more occurrences of pattern
x+                      1 or more occurrences of the character x
(pattern)+              1 or more occurrences of pattern
x?                      0 or 1 occurrences of the character x
(pattern)?              0 or 1 occurrences of pattern
pattern1|pattern2       pattern1 or pattern2
^pattern                pattern immediately following markup
pattern$                pattern immediately preceding markup
[string]                any single character in string
[^string]               any single character not in string
[char1-char2]           any character in the range char1-char2
[^char1-char2]          any character not in the range char1-char2
\n                      in a replace string, is replaced by the text matched by the nth subexpression in parentheses in the search string
\0                      in a replace string, is replaced by the text matched by the entire search string